{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# The Canonical Time-series Characteristics (catch22) transform\n",
    "\n",
    "catch22\\[1\\] is a collection of 22 time series features extracted from the 7000+ present in the _hctsa_\\[2\\]\\[3\\] toolbox.\n",
    "A hierarchical clustering was performed on the correlation matrix of features that performed better than random chance to remove redundancy.\n",
    "These clusters were sorted by balanced accuracy using a decision tree classifier and a single feature was selected from the 22 clusters formed, taking into account balanced accuracy results, computational efficiency and interpretability.\n",
    "\n",
    "In this notebook, we will demonstrate how to use the catch22 transformer on the ItalyPowerDemand univariate and BasicMotions multivariate datasets. We also show catch22 used for classication with a random forest classifier.\n",
    "\n",
    "Both make use of the features implemented in the catch22 package (<https://github.com/chlubba/catch22>), where versions of catch22 for C and MATLAB are also available.\n",
    "\n",
    "#### References:\n",
    "\n",
    "\\[1\\] Lubba, C. H., Sethi, S. S., Knaute, P., Schultz, S. R., Fulcher, B. D., & Jones, N. S. (2019). catch22: CAnonical Time-series CHaracteristics. Data Mining and Knowledge Discovery, 33(6), 1821-1852.\n",
    "\n",
    "\\[2\\] Fulcher, B. D., & Jones, N. S. (2017). hctsa: A computational framework for automated time-series phenotyping using massive feature extraction. Cell systems, 5(5), 527-531.\n",
    "\n",
    "\\[3\\] Fulcher, B. D., Little, M. A., & Jones, N. S. (2013). Highly comparative time-series analysis: the empirical structure of time series and their methods. Journal of the Royal Society Interface, 10(83), 20130048."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:30:07.306937Z",
     "iopub.status.busy": "2020-12-19T14:30:07.306390Z",
     "iopub.status.idle": "2020-12-19T14:30:08.036353Z",
     "shell.execute_reply": "2020-12-19T14:30:08.036857Z"
    }
   },
   "source": [
    "from sklearn import metrics\n",
    "\n",
    "from sktime.classification.hybrid import Catch22ForestClassifier\n",
    "from sktime.datasets import load_basic_motions, load_italy_power_demand\n",
    "from sktime.transformations.panel.catch22_features import Catch22"
   ],
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Load data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:30:08.041533Z",
     "iopub.status.busy": "2020-12-19T14:30:08.041060Z",
     "iopub.status.idle": "2020-12-19T14:30:08.210768Z",
     "shell.execute_reply": "2020-12-19T14:30:08.211258Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(67, 1) (67,) (50, 1) (50,)\n",
      "(40, 6) (40,) (40, 6) (40,)\n"
     ]
    }
   ],
   "source": [
    "IPD_X_train, IPD_y_train = load_italy_power_demand(split=\"train\", return_X_y=True)\n",
    "IPD_X_test, IPD_y_test = load_italy_power_demand(split=\"test\", return_X_y=True)\n",
    "IPD_X_test = IPD_X_test[:50]\n",
    "IPD_y_test = IPD_y_test[:50]\n",
    "\n",
    "print(IPD_X_train.shape, IPD_y_train.shape, IPD_X_test.shape, IPD_y_test.shape)\n",
    "\n",
    "BM_X_train, BM_y_train = load_basic_motions(split=\"train\", return_X_y=True)\n",
    "BM_X_test, BM_y_test = load_basic_motions(split=\"test\", return_X_y=True)\n",
    "\n",
    "print(BM_X_train.shape, BM_y_train.shape, BM_X_test.shape, BM_y_test.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. catch22 transform\n",
    "\n",
    "### Univariate\n",
    "\n",
    "The catch22 features are provided in the form of a transformer, `Catch22`.\n",
    "From this the transformed data can be used for a variety of time series analysis tasks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:30:08.215545Z",
     "iopub.status.busy": "2020-12-19T14:30:08.215049Z",
     "iopub.status.idle": "2020-12-19T14:30:08.222937Z",
     "shell.execute_reply": "2020-12-19T14:30:08.223422Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": "Catch22()"
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "c22_uv = Catch22()\n",
    "c22_uv.fit(IPD_X_train, IPD_y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:30:08.226714Z",
     "iopub.status.busy": "2020-12-19T14:30:08.226142Z",
     "iopub.status.idle": "2020-12-19T14:30:08.252491Z",
     "shell.execute_reply": "2020-12-19T14:30:08.252970Z"
    },
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "         0         1   2   3         4         5         6     7         8   \\\n",
      "0  1.158650 -0.217228   3   6  0.485847  0.095361  1.000000   8.0  0.040000   \n",
      "1  0.918174 -0.214746   4   8  0.548270  0.071349  0.869565  15.0  0.111111   \n",
      "2 -0.273186 -0.085866   2   5  0.464109  0.216576  0.913043   3.0  0.034014   \n",
      "3  0.048404 -0.450092   4  10  0.609319  0.124926  0.869565  13.0  0.111111   \n",
      "4  0.426386 -0.450726   4   7  0.559022  0.054549  0.913043  16.0  0.111111   \n",
      "\n",
      "   9   ...        12        13        14        15   16        17   18   19  \\\n",
      "0   0  ...  0.750000  0.291667 -0.625000  0.468050  5.0  1.787502  0.0  0.0   \n",
      "1   0  ...  0.500000  0.208333 -0.666667  0.702777  5.0  1.730238  0.0  0.0   \n",
      "2   0  ...  0.666667  0.875000  0.250000  0.310570  5.0  1.730238  0.0  0.0   \n",
      "3   0  ...  0.666667  0.166667 -0.625000  0.804046  6.0  1.605420  0.0  0.0   \n",
      "4   0  ...  0.500000  0.291667 -0.666667  0.675482  6.0  1.730238  0.0  0.0   \n",
      "\n",
      "         20        21  \n",
      "0  0.589049  0.857423  \n",
      "1  0.196350  0.682608  \n",
      "2  0.589049  0.886426  \n",
      "3  0.196350  0.664320  \n",
      "4  0.196350  0.674197  \n",
      "\n",
      "[5 rows x 22 columns]\n"
     ]
    }
   ],
   "source": [
    "transformed_data_uv = c22_uv.transform(IPD_X_train)\n",
    "print(transformed_data_uv.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The transform `Catch22` method will process all 22 features.\n",
    "For individual features, the transform_single_feature method can be used when provided with a numeric feature ID or the feature name."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:30:08.257918Z",
     "iopub.status.busy": "2020-12-19T14:30:08.257341Z",
     "iopub.status.idle": "2020-12-19T14:30:08.260888Z",
     "shell.execute_reply": "2020-12-19T14:30:08.261385Z"
    },
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[3 4 2 4 4 3 3 4 4 4 4 4 4 4 4 3 4 4 4 4 4 4 4 4 4 3 4 4 4 4 4 3 4 4 3 4 4\n",
      " 3 4 4 3 4 4 3 4 4 3 4 3 3 4 3 4 3 3 4 4 4 3 2 4 4 4 4 4 4 2]\n"
     ]
    }
   ],
   "source": [
    "transformed_feature_uv = c22_uv._transform_single_feature(IPD_X_train, \"CO_f1ecac\")\n",
    "print(transformed_feature_uv)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Multivariate\n",
    "\n",
    "Transformation of multivariate data is supported by `Catch22`.\n",
    "The default procedure will concatenate each column prior to transformation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:30:08.264541Z",
     "iopub.status.busy": "2020-12-19T14:30:08.264050Z",
     "iopub.status.idle": "2020-12-19T14:30:08.266022Z",
     "shell.execute_reply": "2020-12-19T14:30:08.266517Z"
    },
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": "Catch22()"
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "c22_mv = Catch22()\n",
    "c22_mv.fit(BM_X_train, BM_y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:30:08.271483Z",
     "iopub.status.busy": "2020-12-19T14:30:08.270986Z",
     "iopub.status.idle": "2020-12-19T14:30:08.413472Z",
     "shell.execute_reply": "2020-12-19T14:30:08.413974Z"
    },
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "         0         1   2   3         4         5         6     7         8   \\\n",
      "0 -0.417887  0.187460   2   5  0.105551 -0.649750  0.794658   9.0  0.002256   \n",
      "1  1.061005  0.294088   2   6  0.043438 -1.968673  0.626043  79.0  0.012387   \n",
      "2 -0.065333  0.297270   3   6  0.111391 -0.015163  0.808013  25.0  0.000808   \n",
      "3  0.274125 -0.130168   2   5  0.062411 -0.103788  0.841402  17.0  0.005236   \n",
      "4 -0.310012  0.068879   3   5  0.086561  0.321862  0.813022  26.0  0.012837   \n",
      "\n",
      "   9   ...        12        13        14        15   16        17        18  \\\n",
      "0   9  ...  0.666667 -0.576667 -0.421667  0.479256  7.0  1.952422  0.127660   \n",
      "1  10  ...  0.250000 -0.501667 -0.318333  0.603359  7.0  1.858507  0.170213   \n",
      "2  11  ...  1.000000 -0.446667 -0.470000  0.812749  8.0  1.927062  0.127660   \n",
      "3   8  ...  0.666667 -0.366667 -0.351667  0.486981  7.0  2.009502  0.127660   \n",
      "4  10  ...  0.750000 -0.501667 -0.695833  0.767016  7.0  1.869634  0.787234   \n",
      "\n",
      "         19        20        21  \n",
      "0  0.659574  0.632000  1.095149  \n",
      "1  0.872340  0.570641  0.987787  \n",
      "2  0.659574  0.533825  0.994098  \n",
      "3  0.638298  0.625864  1.038027  \n",
      "4  0.617021  0.570641  0.955538  \n",
      "\n",
      "[5 rows x 22 columns]\n"
     ]
    }
   ],
   "source": [
    "transformed_data_mv = c22_mv.transform(BM_X_train)\n",
    "print(transformed_data_mv.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. catch22 Forest Classifier\n",
    "\n",
    "For classification tasks the default classifier to use with the catch22 features is random forest classifier.\n",
    "An implementation making use of the `RandomForestClassifier` from sklearn built on catch22 features is provided in the form on the `Catch22ForestClassifier` for ease of use."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:30:08.431962Z",
     "iopub.status.busy": "2020-12-19T14:30:08.419431Z",
     "iopub.status.idle": "2020-12-19T14:30:08.535295Z",
     "shell.execute_reply": "2020-12-19T14:30:08.535836Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": "Catch22ForestClassifier(random_state=0)"
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "c22f = Catch22ForestClassifier(n_estimators=100, random_state=0)\n",
    "c22f.fit(IPD_X_train, IPD_y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:30:08.553299Z",
     "iopub.status.busy": "2020-12-19T14:30:08.552508Z",
     "iopub.status.idle": "2020-12-19T14:30:08.561331Z",
     "shell.execute_reply": "2020-12-19T14:30:08.561821Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "C22F Accuracy: 0.84\n"
     ]
    }
   ],
   "source": [
    "c22f_preds = c22f.predict(IPD_X_test)\n",
    "print(\"C22F Accuracy: \" + str(metrics.accuracy_score(IPD_y_test, c22f_preds)))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "name": "python3",
   "language": "python",
   "display_name": "Python 3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
